Plastome Evolution, Phylogenomics, and DNA Barcoding Investigation of Gastrochilus (Aeridinae, Orchidaceae), with a Focus on the Systematic Position of Haraella retrocalla

Gastrochilus is an orchid genus containing about 70 species in tropical and subtropical Asia with high morphological diversity. The phylogenetic relationships among this genus have not been fully resolved, and the plastome evolution has not been investigated either. In this study, five plastomes of Gastrochilus were newly reported, and sixteen plastomes of Gastrochilus were used to conduct comparative and phylogenetic analyses. Our results showed that the Gastrochilus plastomes ranged from 146,183 to 148,666 bp, with a GC content of 36.7–36.9%. There were 120 genes annotated, consisting of 74 protein-coding genes, 38 tRNA genes, and 8 rRNA genes. No contraction and expansion of IR borders, gene rearrangements, or inversions were detected. Additionally, the repeat sequences and codon usage bias of Gastrochilus plastomes were highly conserved. Twenty hypervariable regions were selected as potential DNA barcodes. The phylogenetic relationships within Gastrochilus were well resolved based on the whole plastome, especially among main clades. Furthermore, both molecular and morphological data strongly supported Haraella retrocalla as a member of Gastrochilus (G. retrocallus).


Introduction
Gastrochilus D. Don (Aeridinae, Vandeae, Epidendroideae, Orchidaceae) is an epiphytic orchid genus comprising about 70 species that is widely distributed in tropical and subtropical Asia [1][2][3][4], with a center of species diversity in the South-East Asian archipelago [5,6]. Because of its high morphological diversity and brightly colored flowers, it has potential horticultural value for pot culture, hanging baskets, and tree mounting [2,7]. Additionally, it has been used medicinally to treat mastitis and body aches and for detoxification, owing to phytochemicals such as bioactive alkaloids [2,8].
Recently, phylogenetic analyses confirmed that the genus Gastrochilus is monophyletic, but the infrageneric relationships have not been completely resolved. Based on ITS and four plastid markers (atpI-atpH, matK, psbA-trnH, and trnL-F), Zou et al. [9] showed that the nine sampled Gastrochilus species formed a clade with high Bayesian inference support, and similar results were obtained in other phylogenetic studies using ITS and several chloroplast DNA markers. Unfortunately, the relationships recovered within Gastrochilus were not consistent among these studies [3,4,6]. Specifically, Liu et al. [6] proposed that Gastrochilus can be divided into five clades based on five DNA regions (ITS, matK, psbA-trnH, psbM-trnD, and trnL-F). Recently, Zhang et al. [4] employed the same five markers and divided Gastrochilus into six sections. However, these infrageneric classifications were not supported by Li et al. [3], who combined ITS and seven chloroplast DNA markers. Further, Liu et al. [10] reconstructed the phylogenetic relationships within the Cleisostoma-Gastrochilus clades (Aeridinae) based on 68 plastid genes, and strongly supported a close relationship between Gastrochilus and Pomatocalpa.
In particular, the position of Gastrochilus retrocallus (Hayata) Hayata (Haraella retrocalla (Hayata) Kudo) has been controversial. G. retrocallus is an epiphytic species endemic to Taiwan, China. It was first described as Saccolabium retrocallum by Hayata [11], and then recognized as a member of Gastrochilus as G. retrocallus [12]. Because this species lacks a saccate hypochile, Kudo [13] established a new genus, Haraella, including two species, H. retrocalla and H. odorata Kudo. Later, Smith [14] transferred H. odorata into Gastrochilus as G. odoratus (Kudo) J.J.Sm. After a detailed morphological examination, Tsi [1] revised the genus Gastrochilus and treated G. odoratus and G. retrocallus as synonyms of H. retrocalla. However, this taxonomic treatment has not been adopted by some taxonomists, who still recognize the species as G. retrocallus (e.g., [2,15], https://powo.science.kew.org/, accessed on 7 June 2024). Additionally, recent phylogenetic studies based on combined ITS and plastid DNA markers have still not effectively resolved its systematic position. For example, Zou et al. [9] indicated that H. retrocalla was sister to Gastrochilus with low support (PP BI = 0.65, BS ML < 50), while Liu et al. [6] found it nested in Pomatocalpa (PP BI = 1.00). Therefore, it is crucial to employ more effective molecular markers to resolve the phylogenetic relationships within Gastrochilus and to further clarify the taxonomic position of H. retrocalla.
In this study, we newly sequenced, assembled, and annotated the plastomes of five Gastrochilus species, and combined them with 11 previously reported plastomes of Gastrochilus to conduct comparative analyses, DNA barcoding investigation, and phylogenetic reconstruction within Gastrochilus. Our objectives were (1) to investigate general features and understand the evolutionary pattern of Gastrochilus plastomes; (2) to identify some hypervariable regions as potential DNA barcodes for future species identification within Gastrochilus; and (3) to explore the phylogenetic relationships of Gastrochilus and discuss the systematic position of H. retrocalla.

Plastome Structural Variations
The IR boundary map was generated by comparing the plastomes of 16 Gastrochilus species using IRscope (Figure 2). At the junction between LSC and IRb (JLB), the rpl22 gene in all species spanned from LSC to IRb, with 31-44 bp located in IRb. Moreover, the trnN and rpl32 genes of Gastrochilus plastomes were found adjacent to the junction between the SSC and IRb (JSB), but neither of them spanned the junction. In addition, the ycf1 gene spanned from SSC to IRa in 15 plastomes at the junction between SSC and IRa (JSA), extending 11 to 195 bp into IRa, while the ycf1 gene of G. guangtungensis Z.H.Tsi was entirely located in the SSC region. As for the junction between IRa and LSC (JLA), the rps19 and psbA genes were detected on the left and right side of the JLA line, respectively. The collinearity analysis revealed no gene rearrangements or inversions in the Gastrochilus plastomes (Figure S1).

A total of 706 tandem repeats were detected, ranging from 30 (G. japonicus) to 53 (G. retrocallus) per plastome. There were 626 long repeats in the Gastrochilus plastomes, comprising four types: palindrome, forward, reverse, and complement. Among them, palindrome repeats were the dominant type, followed by forward repeats, with percentages of 65.81% and 30.19%, respectively, while reverse and complement repeats accounted for only 3.51% and 0.48%, respectively. All four types of long repeats were detected together in only two species (G. distichus (Lindl.) Kuntze and G. prionophyllus). The total number of long repeats ranged from 31 (G. japonicus) to 51 (G. retrocallus) in each Gastrochilus plastome.
To quantify the degree of codon usage bias, we estimated the relative synonymous codon usage (RSCU) of the 16 Gastrochilus plastomes using CodonW; the results are visualized in Figure S2. There were 30 preferred codons (RSCU > 1), 2 non-preferred codons (RSCU = 1), and 32 less frequently used codons (RSCU < 1). Most of the preferred codons ended with A or U, except for UUG. Moreover, leucine (Leu, encoded by UUA, UUG, CUU, CUC, CUA, and CUG) was the most frequently encoded amino acid, while cysteine (Cys, encoded by UGU and UGC) had the lowest frequency. The codons AGA and UUA exhibited the highest RSCU values, with average values of 1.92 and 1.89, respectively, while the codons CGC and CGG had the lowest RSCU values, with average values of 0.31 and 0.34, respectively.
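As an illustration of the RSCU statistic reported above, the following minimal Python sketch computes RSCU for a single in-frame coding sequence as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is not the CodonW implementation; the function name rscu, the use of the standard genetic code, and the DNA alphabet (T in place of U) are assumptions made for the example.

```python
from collections import Counter

# Standard genetic code in TCAG order (assumption: DNA alphabet, T instead of U).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)  # count expected under equal usage
        for c in codons:
            result[c] = counts[c] / expected
    return result


example = rscu("ATGTTATTATTACTG")
print(round(example["TTA"], 2), round(example["CTG"], 2))
```

Under this definition, an amino acid encoded by a single codon (Met or Trp) always yields RSCU = 1 for that codon, and the RSCU values within a synonymous family sum to the family size.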

Plastome Sequence Divergence and Barcoding Investigation
The divergence of the complete plastome sequences among the 16 Gastrochilus species was analyzed using mVISTA with Pomatocalpa spicatum Breda as the reference (Figure S3). The whole-genome alignment revealed that sequence variation in the conserved noncoding sequences (CNS; pink bars) was greater than that in the protein-coding regions (exons; purple bars). The variation rates of both coding and non-coding regions in the two IR regions were lower than those in the LSC and SSC regions. Additionally, the non-coding intergenic regions, such as trnS(GCU)-trnG(GCC), rpl32-trnL(UAG), and psaC-rps15, were highly divergent, while the rRNA genes were highly conserved compared with other genes.

Phylogenomic Analysis
The topologies based on the whole plastome (excluding IRa) and 68 CDSs were largely concordant. BI and ML analyses also yielded nearly identical topologies, with some differences in the support values of certain nodes (Figure 5A,B). The species of Gastrochilus formed a well-supported monophyletic group (PP BI = 1.00, BS ML = 100), which was recovered as sister to Pomatocalpa. G. retrocallus (formerly treated as Haraella retrocalla) diverged first as clade I, and the remaining species of Gastrochilus could be divided into three monophyletic clades with strong support (PP BI = 1.00, BS ML = 100). Specifically, G. gongshanensis Z.H.Tsi and G. obliquus (Lindl.) Kuntze formed clade II. Clade III consisted of a pair of sister groups with strong support (PP BI = 1.00, BS ML = 100): one included G. formosanus (Hayata) Hayata, G. sinensis, and G. distichus, and the other included G. acinacifolius and G. guangtungensis. Finally, the remaining species formed clade IV, which also included two monophyletic subclades with strong support (PP BI = 1.00, BS ML = 100). However, clades II and III were recovered as sister groups with weaker support in both datasets.

In addition, to test the resolution of potential DNA barcodes for phylogenetic analyses, we reconstructed the phylogenetic relationships of Gastrochilus based on the top 10 hypervariable regions in the whole plastome and 68 CDSs. The two phylogenetic trees presented the same topologies (Figure 5C,D). All sampled Gastrochilus species formed a monophyletic group with high support values (PP BI = 0.98 and 1.00, BS ML = 85 and 98, respectively). Remarkably, the monophyly of clades II, III, and IV was strongly supported (PP BI = 1.00, BS ML = 100), while the relationships among the four clades were poorly resolved.

Plastome Evolution within Gastrochilus
In this study, we report five Gastrochilus plastomes for the first time, providing genetic resources for understanding plastome evolution in this group. All Gastrochilus plastomes had a typical quadripartite structure (Figure 1), consisting of one LSC region, one SSC region, and two IR regions, similar to other orchids and most angiosperms (e.g., [21,24]). Limited variation in plastome size was detected among Gastrochilus species: G. fuscopunctatus possessed the smallest plastome at 146,183 bp, and G. acinacifolius had the largest at 148,666 bp. Plastome size falls within the previously
No visible gene rearrangement was detected among Gastrochilus plastomes (Figure S1), as has also been observed in other orchid genera (e.g., Aerides [20]; Epidendrum [23]). In addition, our results revealed that all IR boundaries were conserved without distinct contraction or expansion (Figure 2). We only observed a slight difference in the JSA boundary regions of Gastrochilus plastomes. In most Gastrochilus plastomes, the ycf1 gene extended from SSC into IRa for 11-195 bp, the exception being G. guangtungensis. Similarly, the ycf1 gene in the SSC region of other orchids (such as Epidendrum [23] and Pholidota [22]) was also observed crossing JSA and extending into the IRa region. Our results also indicated that codon usage bias was highly conserved among the 16 Gastrochilus plastomes (Figure S2), which is consistent with previous studies of codon preference in Orchidaceae (e.g., [21,23,30]) and further demonstrates the high level of plastome conservation in Gastrochilus.
In addition, our results showed that species with closer phylogenetic relationships tend to have more similar plastome structures and sequence divergence patterns (Figures 2 and S3). For example, the whole ycf1 gene of G. guangtungensis was located in SSC, and ycf1 in its sister species extended from SSC into IRa for only 11 bp, while the three species with the longest span (195 bp) of ycf1 formed a subclade (Figure 2). Additionally, unique sequence divergence at about 99.5 kb and 132 kb appeared only in clade II, and the species with significant sequence divergence at about 54 kb and 60.5 kb formed a monophyletic group (Figure S3). Similar phenomena were also observed in other studies, such as those on Chiloschista [28], Pholidota [22], and Epidendrum [23]. Therefore, we speculate that the structure and sequence divergence of plastomes may also contain important evolutionary information.

Genetic Molecular Markers
SSRs are often used as genetic molecular markers in phylogenetic studies of closely related species (e.g., [31][32][33]). In this study, a total of 763 SSRs were identified in the plastomes of 16 Gastrochilus species, with 70.90% of them being mono-nucleotide repeats. A/T SSRs were more abundant than G/C SSRs (Figure 3; Table S2), which may result from a bias towards A/T in plastomes [30]. Most di- to hexa-nucleotide SSRs among Gastrochilus species were species-specific (Figure 3). These SSRs were widely distributed throughout the plastome, and more than half of them (62.91%) were located in the LSC region, which is similar to other angiosperm plastomes (e.g., [24,34]). The diversity and richness of SSR types vary across species, which may be attributed to genetic variation among species [30].
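The thresholds used for SSR detection are not restated in this section, so the sketch below only illustrates the general idea of locating perfect mono- to hexa-nucleotide repeats with regular expressions. The function find_ssrs and the minimum repeat counts in MIN_REPEATS are hypothetical values chosen for the example, not the settings behind the counts reported above.

```python
import re

# Hypothetical minimum number of repeat units per motif length
# (assumption for illustration only; not taken from the study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, n_units) for perfect SSRs in a DNA string."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), which belong to a smaller motif class.
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


toy = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GCGC"
for start, motif, units in find_ssrs(toy):
    print(start, motif, units)
```

Start positions are 0-based. Tallying the returned motifs by length and base composition would give counts of the kind summarized in Figure 3, subject to the thresholds actually used.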
Hypervariable regions used for phylogenetic and identification analyses have been reported in Orchidaceae [36,37] and in many orchid lineages, such as the Cleisostoma-Gastrochilus clades [10] and Chiloschista [28]. In this study, the phylogenetic relationships resolved based on the ten hypervariable regions of the whole plastome and CDSs were nearly the same as those based on the whole plastome and 68 CDSs (Figure 5); therefore, we propose that the top ten hypervariable regions of the whole plastome and CDSs might be powerful markers for the phylogenetic analysis of Gastrochilus.

The Systematic Position of Haraella retrocalla and Phylogenomics of Gastrochilus
The phylogenetic results based on the whole plastome (excluding IRa), 68 CDSs, and ten hypervariable regions strongly supported the grouping of H. retrocalla (G. retrocallus) with Gastrochilus species (Figure 5). In particular, the whole-plastome phylogeny strongly supported H. retrocalla as sister to the other Gastrochilus species. H. retrocalla was previously recognized as a member of Haraella [1,5,13] or Gastrochilus [2,12,15]. In this study, 14 morphological characters (representing stem, leaf, and flower) of 42 Gastrochilus species, H. retrocalla, and three Pomatocalpa species were analyzed (Table S3). Except for the absence of a saccate hypochile in Haraella, H. retrocalla and Gastrochilus species are very similar in stem, leaf, and flower size, as well as in the number of flowers per inflorescence. Additionally, principal component analysis (PCA) revealed that H. retrocalla is not differentiated from Gastrochilus species but is clearly distinct from Pomatocalpa (Figure S4). Therefore, our results support the inclusion of H. retrocalla in Gastrochilus as G. retrocallus based on morphological and molecular evidence.
All four phylogenetic analyses strongly supported the monophyly of Gastrochilus (Figure 5). G. retrocallus was sister to the other Gastrochilus species with high support based on the whole plastome. The phylogenetic relationships of Gastrochilus were better resolved based on the whole plastome than on the other datasets. In addition, the monophyly of the other three clades was fully supported (PP BI = 1.00, BS ML = 100%) based on the whole plastome and 68 CDSs, which is consistent with the results of Liu et al. [10] based on 68 CDSs. However, the relationships among the three clades were not completely resolved, which may be due to the rapid divergence of Gastrochilus during the late Miocene [3]. Many studies have indicated that the phylogenetic relationships of recently radiated plant lineages, such as Rhododendron [38], Astragalus [39], and Acacia [40], are not clear. Therefore, we suggest that more sampling and more molecular data (such as mitogenomes and transcriptomes) are needed to better understand the phylogenetic relationships within Gastrochilus.

Sampling and Sequencing
In this study, five new plastomes of Gastrochilus species were obtained, namely G. acinacifolius, G. distichus, G. malipoensis X.H.Jin & S.C.Chen, G. prionophyllus, and G. yunnanensis Schltr. Another eleven published plastomes of Gastrochilus were downloaded from GenBank, and their annotations were manually updated in Geneious v9.1.4 [41]. Additionally, two species of the Gastrochilus clade in Aeridinae (Pomatocalpa spicatum and Trichoglottis philippinensis Lindl.) were selected as outgroups based on Liu et al. [10]. Detailed information on the samples is listed in Table S4.
Leaf samples of the five newly sampled Gastrochilus species were obtained from plants cultivated at the Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Yunnan. We extracted total DNA from silica gel-dried leaves using a modified CTAB method [42]. Library construction was performed with the NEBNext® Ultra DNA Library Prep Kit (NEB, Ipswich, MA, USA), and paired-end 150 bp sequencing was carried out on an Illumina HiSeq 2000 platform at the Kunming Institute of Botany, Chinese Academy of Sciences (Yunnan, China). Finally, approximately 4 Gb of data were obtained for each species.

Structure and Sequence Divergence Analyses
To evaluate the possible expansion and contraction of the IR boundaries, the genes in the boundary regions of LSC/IRb/SSC/IRa were visualized using IRscope v3.1 [47]. Moreover, to detect gene rearrangements, the Mauve v1.1.3 [48] plugin in Geneious v9.1.4 [41] was used to conduct the collinearity analysis with default parameters. The online program mVISTA [49] was used to analyze the sequence divergence of Gastrochilus plastomes using the Pomatocalpa spicatum (MN124411) plastome as a reference.

Evolutionary Hotspots and Phylogenetic Analyses
To avoid the impact of the two IR regions on phylogenetic reconstruction, we identified the hypervariable regions and conducted the phylogenetic analyses with IRa excluded. We identified the hypervariable regions of the 16 Gastrochilus species based on the following two matrices: (1) whole plastomes with IRa excluded and (2) 68 CDSs. The two matrices were aligned using MAFFT v7 [59] and manually adjusted in BioEdit v7.0 [60]. We then evaluated the nucleotide diversity (Pi) using DnaSP v6.12.03 [61] with a sliding window analysis, setting the step size to 200 bp and the window length to 800 bp.
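The sliding window calculation can be sketched in a few lines of Python. The example below computes nucleotide diversity as the mean number of pairwise differences per site within each window of an alignment, using the 800 bp window and 200 bp step described above as defaults. It is a simplified stand-in rather than the DnaSP procedure: the function names window_pi and sliding_pi are hypothetical, sequences are assumed to be aligned, uppercase, and of equal length, and columns containing gaps or ambiguous bases are simply skipped, which is an assumption of this sketch.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Mean pairwise differences per site over alignment columns [start, end)."""
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = 0
    sites = 0
    for col in range(start, end):
        column = [s[col] for s in seqs]
        if any(base not in "ACGT" for base in column):
            continue  # assumption: drop gapped or ambiguous columns
        sites += 1
        diffs += sum(a != b for a, b in combinations(column, 2))
    if sites == 0 or n_pairs == 0:
        return 0.0
    return diffs / (n_pairs * sites)


def sliding_pi(seqs, window=800, step=200):
    """Return a list of (start, end, pi) tuples along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start, end, window_pi(seqs, start, end)))
    return results


# Toy alignment with a short window so the output is easy to check by hand.
toy = ["AAAA", "AAAT", "AATT"]
for start, end, pi in sliding_pi(toy, window=2, step=1):
    print(start, end, round(pi, 3))
```

Ranking the windows by their Pi values and keeping the highest-scoring ones is one simple way to shortlist candidate hypervariable regions from such output.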

Morphological Character Analysis
To test whether morphological differentiation corroborates the phylogenetic relationship of H. retrocalla, we collected 14 morphological traits of the stem, leaves, and flowers that are considered taxonomically important in systematic studies of Gastrochilus. All morphological traits were collected from existing studies (e.g., [1,5,65,66]). Then, we conducted PCA with the R package "vegan" [67] to delimit genus boundaries, including the morphological data from H. retrocalla, 42 species of Gastrochilus, and 3 species of Pomatocalpa, which is the sister group of Gastrochilus [10]. In this analysis, the first two coordinates were selected to draw the PCA scatter plot. The 14 morphological traits of the 46 species are provided in Table S3.
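For readers who want to see what such an ordination involves, the following plain-Python sketch standardizes a species-by-trait matrix and extracts the first two principal axes by power iteration with deflation. It is a stand-in for illustration, not the vegan routine used in the study; the function names are hypothetical, scaling every trait to unit variance is an assumption, and the toy matrix is invented data with no biological meaning.

```python
import math


def standardize(rows):
    """Scale each trait (column) to mean 0 and unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def covariance(z):
    """Sample covariance matrix of the (already centered) columns of z."""
    n, p = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_axis(mat, iters=1000):
    """Power iteration: dominant eigenvalue and unit eigenvector of mat."""
    p = len(mat)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    value = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
        value = norm
    return value, v


def pca_scores(rows, n_axes=2):
    """Return per-sample scores on the first n_axes and their variances."""
    z = standardize(rows)
    cov = covariance(z)
    p = len(cov)
    scores, variances = [], []
    for _ in range(n_axes):
        value, vec = leading_axis(cov)
        variances.append(value)
        scores.append([sum(a * b for a, b in zip(r, vec)) for r in z])
        # Deflate: remove the axis just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return list(zip(*scores)), variances


# Invented toy matrix: 5 samples x 3 traits.
toy = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.2, 0.9],
       [4.0, 7.9, 0.2], [5.0, 10.1, 0.7]]
points, variances = pca_scores(toy)
print(len(points), variances[0] > variances[1])
```

Plotting the two score columns against each other, with points labeled by genus, gives a scatter plot of the kind described for Figure S4.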

Figure 1. Plastome structure of five Gastrochilus species. The darker gray in the inner circle corresponds to the GC content. Bars of different colors indicate different functional groups. Genes on the inside of the circle are transcribed clockwise, while genes annotated outside the circle are transcribed counterclockwise. LSC: large single-copy region; SSC: small single-copy region; IRa and IRb: two inverted repeat regions.

Figure 2. Comparison of the boundaries between the LSC, SSC, and IR regions in the sixteen Gastrochilus plastomes. The topology on the left is the ML tree based on plastomes (excluding IRa). JLB: LSC/IRb junctions; JSB: SSC/IRb junctions; JSA: SSC/IRa junctions; JLA: LSC/IRa junctions.

Figure 3. Number of each SSR repeat pattern in the 16 Gastrochilus plastomes. The topology on the left is the ML tree based on plastomes (excluding IRa).

Figure 4. Sliding window analysis of nucleotide diversity for Gastrochilus plastomes. (A) The nucleotide diversity of the whole plastome (excluding IRa). (B) The nucleotide diversity of 68 protein-coding sequences. The top ten hypervariable regions of each dataset are annotated.

Figure 5. Comparisons of phylogenetic tree topologies for four datasets based on BI and ML analyses in Gastrochilus species. (A) Whole plastomes (excluding IRa). (B) Sixty-eight CDSs. (C) Top ten hypervariable regions of whole plastomes. (D) Top ten hypervariable regions of 68 CDSs. Numbers above the branches are Bayesian posterior probabilities and ML bootstrap values, respectively. A dash (-) indicates that the supporting values are less than 50%.
